A Machine Learning Approach to Multiword Expression Extraction
Author
Abstract
This paper describes our participation in the MWE 2008 evaluation campaign, which focused on ranking MWE candidates. Our ranking system employed 55 association measures, combined by standard statistical classification methods that were modified to produce scores for ranking. Our results were cross-validated and compared using Mean Average Precision (MAP). In most experiments, we observed a significant performance improvement from methods that combine multiple association measures.
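To make the idea of ranking candidates with a classifier's scores concrete, here is a minimal Python sketch under stated assumptions. It computes three classic association measures (PMI, Dice, and chi-square) from a bigram's 2×2 contingency table instead of the paper's 55. It combines them with logistic regression, used here only as one example of a standard classifier. It ranks candidates by the predicted probability rather than the hard class label, and scores the ranking with average precision. The counts, the function names (`association_measures`, `train_logistic`, `average_precision`) and the hyperparameters are hypothetical; this is not the authors' implementation.

```python
import math

def association_measures(f_xy, f_x, f_y, n):
    """Three classic association measures for a bigram (x, y).

    f_xy: bigram count; f_x: count of x as first word; f_y: count of y
    as second word; n: total number of bigrams in the corpus.
    """
    # 2x2 contingency table: a = (x, y), b = (x, not y), c = (not x, y), d = neither
    a, b, c = f_xy, f_x - f_xy, f_y - f_xy
    d = n - f_x - f_y + f_xy
    pmi = math.log2(a * n / (f_x * f_y))   # pointwise mutual information
    dice = 2 * a / (f_x + f_y)              # Dice coefficient
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return [pmi, dice, math.log1p(chi2)]    # log damps chi-square's large range

def standardize(rows):
    """Z-score each feature column so no single measure dominates training."""
    cols = list(zip(*rows))
    means = [sum(col) / len(col) for col in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
            for col, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.1, epochs=500):
    """Plain stochastic gradient ascent on the logistic log-likelihood."""
    w, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - sigmoid(bias + sum(wj * xj for wj, xj in zip(w, xi)))
            bias += lr * err
            w = [wj + lr * err * xj for wj, xj in zip(w, xi)]
    return w, bias

def average_precision(scores, labels):
    """Mean of precision@k taken at the rank k of every true MWE."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, precisions = 0, []
    for k, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Hypothetical counts (f_xy, f_x, f_y) in a corpus of N bigrams; label 1 = MWE.
N = 100_000
candidates = [(50, 60, 55), (30, 40, 200), (80, 100, 90), (20, 25, 30),
              (5, 2000, 3000), (10, 5000, 800), (3, 400, 6000), (12, 3000, 4000)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

X = standardize([association_measures(*c, N) for c in candidates])
w, bias = train_logistic(X, labels)
# Rank by the classifier's probability rather than its hard 0/1 decision.
ranking_scores = [sigmoid(bias + sum(wj * xj for wj, xj in zip(w, x))) for x in X]
print(round(average_precision(ranking_scores, labels), 3))
```

For brevity, the sketch scores the same candidates it was trained on. A cross-validated setup like the one the abstract mentions would instead train on all but one fold, compute average precision on the held-out fold, and average across folds; that is one way to obtain a mean average precision.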
Similar resources
Automatic Identification of Bengali Noun-Noun Compounds Using Random Forest
This paper presents a supervised machine learning approach that uses the Random Forest algorithm to recognize Bengali noun-noun compounds as multiword expressions (MWEs) in a Bengali corpus. Our proposed approach to MWE recognition has two steps: (1) extraction of candidate multiword expressions using chunk information and various heuristic rules and (2) training the ...
Unsupervised Construction of a Lexicon and a Repository of Variation Patterns for Arabic Modal Multiword Expressions
We present an unsupervised approach to build a lexicon of Arabic Modal Multiword Expressions (AM-MWEs) and a repository of their variation patterns. These novel resources are likely to boost the automatic identification and extraction of AM-MWEs.
Automatic Extraction of Fixed Multiword Expressions
Fixed multiword expressions are strings of words which together behave like a single word. This research establishes a method for the automatic extraction of such expressions. Our method involves three stages. In the first, a statistical measure is used to extract candidate bigrams. In the second, we use this list to select occurrences of candidate expressions in a corpus, together with their s...
A Machine Learning Approach for the Identification of Bengali Noun-Noun Compound Multiword Expressions
This paper presents a machine learning approach for the identification of Bengali multiword expressions (MWEs) that are bigram nominal compounds. Our proposed approach has two steps: (1) candidate extraction using chunk information and various heuristic rules and (2) training a machine learning algorithm called Random Forest to classify the candidates into two groups: bigram nominal compound MWE ...
Improving effectiveness of mutual information for substantival multiword expression extraction
© 2009 Elsevier Ltd. doi:10.1016/j.eswa.2009.02.026. One of the deficiencies of mutual information is its poor capacity to measure the association of words with asymmetric co-occurrence, which occurs in large numbers among multiword expressions in texts. Moreover, thre...
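The mutual-information deficiency described in the last abstract can be illustrated with a small worked example. This example takes "mutual information" in its usual collocation-extraction sense of pointwise mutual information (PMI) and uses hypothetical counts. PMI is symmetric in the two words, so a single score cannot reveal that one word strongly predicts the other while the reverse does not hold. The conditional probabilities P(y | x) and P(x | y) expose that asymmetry. The function names `pmi` and `conditional_probs` are illustrative; the sketch only shows the problem and is not the remedy proposed in that paper, whose abstract is truncated here.

```python
import math

def pmi(f_xy, f_x, f_y, n):
    """Pointwise mutual information log2(P(x, y) / (P(x) P(y)))."""
    return math.log2(f_xy * n / (f_x * f_y))

def conditional_probs(f_xy, f_x, f_y):
    """P(y | x) and P(x | y): how strongly each word predicts the other."""
    return f_xy / f_x, f_xy / f_y

# Hypothetical counts: word x occurs 100 times and is followed by y in 95 of them,
# while y is frequent on its own (5,000 occurrences), so the attraction is one-sided.
n = 1_000_000
f_xy, f_x, f_y = 95, 100, 5_000
p_y_given_x, p_x_given_y = conditional_probs(f_xy, f_x, f_y)

# PMI yields one symmetric value: swapping the roles of x and y leaves it unchanged.
print(f"PMI = {pmi(f_xy, f_x, f_y, n):.2f} bits")
print(f"P(y|x) = {p_y_given_x:.2f}, P(x|y) = {p_x_given_y:.3f}")
```

With these counts, PMI is about 7.57 bits, while P(y | x) = 0.95 and P(x | y) = 0.019. The same PMI would result if the two words' frequencies were swapped, so PMI alone hides the one-sided association.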